12 research outputs found

    Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource

    Word embeddings have recently seen a strong increase in interest as a result of strong performance gains on a variety of tasks. However, most of this research also underlined the importance of benchmark datasets, and the difficulty of constructing these for a variety of language-specific tasks. Still, many of the datasets used in these tasks could prove to be fruitful linguistic resources, allowing for unique observations into language use and variability. In this paper we demonstrate the performance of multiple types of embeddings, created with both count and prediction-based architectures on a variety of corpora, in two language-specific tasks: relation evaluation, and dialect identification. For the latter, we compare unsupervised methods with a traditional, hand-crafted dictionary. With this research, we provide the embeddings themselves, the relation evaluation task benchmark for use in further research, and demonstrate how the benchmarked embeddings prove a useful unsupervised linguistic resource, effectively used in a downstream task.
    Comment: in LREC 201

    Using Distributed Representations to Disambiguate Biomedical and Clinical Concepts

    In this paper, we report a knowledge-based method for Word Sense Disambiguation in the domains of biomedical and clinical text. We combine word representations created on large corpora with a small number of definitions from the UMLS to create concept representations, which we then compare to representations of the context of ambiguous terms. Using no relational information, we obtain comparable performance to previous approaches on the MSH-WSD dataset, which is a well-known dataset in the biomedical domain. Additionally, our method is fast and easy to set up and extend to other domains. Supplementary materials, including source code, can be found at https://github.com/clips/yarn
    Comment: 6 pages, 1 figure, presented at the 15th Workshop on Biomedical Natural Language Processing, Berlin 201

    A Short Review of Ethical Challenges in Clinical Natural Language Processing

    Clinical NLP has immense potential to contribute to how clinical practice will be revolutionized by the advent of large-scale processing of clinical records. However, this potential has remained largely untapped due to slow progress primarily caused by strict data access policies for researchers. In this paper, we discuss the concern for privacy and the measures it entails. We also suggest sources of less sensitive data. Finally, we draw attention to biases that can compromise the validity of empirical research and lead to socially harmful applications.
    Comment: First Workshop on Ethics in Natural Language Processing (EACL'17)

    The produsing expert consumer: co-constructing, resisting and accepting health-related claims on social media in response to an infotainment show about food and nutrition

    This article examines the Twitter and Facebook uptake of health messages from an infotainment TV show on food, broadcast on Belgium’s Dutch-language public broadcaster. Interest in, and the amount of, health-related media coverage is rising; this coverage is an important source of information for laypeople and affects their health behaviours and therapy compliance. However, the role of the audience has also changed: consumers of media content are increasingly produsers and, in the case of health, expert consumers. To explore how current audiences react to health claims, we conducted a quantitative and qualitative content analysis of Twitter and Facebook reactions to an infotainment show about food and nutrition. We examine (1) which elements of the show the audience reacts to, to gain insight into the traction the nutrition-related content generates, and (2) whether audience members accept or resist the health information in the show. Our findings show that information on health and production elicits the most reactions, and that health information incites much refutation, low acceptance, and many suggestions of new information or new angles to complement the show’s information.

    Embarrassingly Simple Unsupervised Aspect Extraction

    We present a simple but effective method for aspect identification in sentiment analysis. Our unsupervised method only requires word embeddings and a POS tagger, and is therefore straightforward to apply to new domains and languages. We introduce Contrastive Attention (CAt), a novel single-head attention mechanism based on an RBF kernel, which gives a considerable boost in performance and makes the model interpretable. Previous work relied on syntactic features and complex neural models. We show that given the simplicity of current benchmark datasets for aspect extraction, such complex models are not needed. The code to reproduce the experiments reported in this paper is available at https://github.com/clips/cat
    Comment: Accepted as ACL 2020 short paper